Mapping and Analysis
Exact Values Versus Full Text
- Value of a field in structured data
- Straightforward: A value either matches or it doesn't
Full text
- Values used to build an inverted index
- Textual data of natural language
- How relevant is the document to the query
Full text is not the same as unstructured data
May is fun but June bores me
Does it refer to months or to people?
Inverted index
This is only a summary; see the Elasticsearch book for the problems each of these steps is trying to solve
- Data structure used for fast querying
- Steps for creation
  - Perform tokenization and normalization (analysis)
    - Tokenization: Splitting text into individual terms
    - Normalization: Converting tokens into a standard form to improve searchability
  - Create a sorted list of unique tokens
  - List the documents in which each term appears
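The steps above can be sketched in a few lines of Python. This is a minimal illustration, not how Lucene implements it: the `analyze` and `build_inverted_index` names, the regex tokenizer, and the two sample documents are all assumptions made for the example.

```python
import re
from collections import defaultdict


def analyze(text):
    """Tokenize on non-alphanumeric characters, then normalize by lowercasing."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)   # tokenization (assumed rule)
    return [token.lower() for token in tokens]   # normalization (assumed rule)


def build_inverted_index(docs):
    """Map each unique term to the sorted list of document ids containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            postings[term].add(doc_id)
    # Sorted list of unique terms, each with the documents it appears in.
    return {term: sorted(postings[term]) for term in sorted(postings)}


# Hypothetical sample documents, keyed by document id.
docs = {
    1: "The quick brown fox",
    2: "Quick brown foxes leap",
}
index = build_inverted_index(docs)
for term, doc_ids in index.items():
    print(term, doc_ids)
```

Because only lowercasing is applied here, `fox` and `foxes` remain separate terms; this is the kind of problem that further normalization is meant to address.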
Analysis and analyzers
Examples
- Built-in
  - Standard
  - Simple
  - Whitespace
  - Language
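To make the difference between two of the built-in analyzers concrete, here is a rough Python approximation. It assumes the commonly documented behaviour (whitespace: split on whitespace only; simple: split on anything that is not a letter, then lowercase). The function names are hypothetical and this is not the Lucene code.

```python
import re


def whitespace_analyzer(text):
    """Split on whitespace only; tokens are left unchanged."""
    return text.split()


def simple_analyzer(text):
    """Split on anything that is not a letter, then lowercase each token."""
    # [^\W\d_]+ matches runs of letters (word characters minus digits and "_").
    return [token.lower() for token in re.findall(r"[^\W\d_]+", text)]


text = "Set the shape to semi-transparent by calling set_trans(5)"
print(whitespace_analyzer(text))
print(simple_analyzer(text))
```

The standard and language analyzers are not approximated here, since they rely on word-boundary rules and language-specific stemming that a short sketch cannot reproduce faithfully.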
When they are used
- Querying a full-text field (the analyzer is applied to the query string)
- (Not applied to exact-value field queries)
Specifying analyzers
- Done by specifying mapping for fields
Mapping
Core simple field types
Type | Description (JSON value) |
---|---|
boolean | true / false |
long | Whole number |
double | Floating-point number |
date | Valid date string |
string | String |
Fields of type string are by default considered to contain full text => will be analyzed before indexing
String matching attributes
index
- Controls how the string is indexed
| Value | Description |
| ----- | ----------- |
| analyzed | Analyze string then index it |
| not_analyzed | Index field exactly as value is specified |
| no | Don't index at all => field not searchable |
analyzer
- For specifying which analyzer to use at search / index time
- Built in: standard, whitespace, english, ...
Updating a mapping
Although you can add to an existing mapping, you can't change existing field mappings. If a field already exists in the mapping, the data from that field probably has already been indexed. If you were to change the field mapping, the already indexed data would be wrong and would not be properly searchable.
Creating a mapping
PUT /gb
{
  "mappings": {
    "tweet": {
      "properties": {
        "tweet": {
          "type": "string",
          "analyzer": "english"
        },
        "date": {
          "type": "date"
        },
        "name": {
          "type": "string"
        },
        "user_id": {
          "type": "long"
        }
      }
    }
  }
}
Modifying a mapping
PUT /gb/_mapping/tweet
{
  "properties" : {
    "tag" : {
      "type" : "string",
      "index": "not_analyzed"
    }
  }
}
Testing the Mapping
GET /gb/_analyze?field=tweet
Body: Black-cats
Output: black, cat
GET /gb/_analyze?field=tag
Body: Black-cats
Output: 'Black-cats'
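The two outputs above follow from the index setting on each field. The Python sketch below mimics that behaviour under loud assumptions: `index_terms` is a hypothetical helper, and the "analyzed" branch uses a crude stand-in (lowercase, split on non-letters, strip a trailing "s") rather than the real english analyzer.

```python
import re


def index_terms(value, index="analyzed"):
    """Return the terms that would end up in the index for a string value."""
    if index == "no":
        return []                      # not indexed, so not searchable
    if index == "not_analyzed":
        return [value]                 # the exact value is the single term
    if index != "analyzed":
        raise ValueError(f"unknown index setting: {index!r}")
    # Assumption: a toy replacement for a language analyzer. It lowercases,
    # splits on non-letters, and strips a trailing "s" from longer tokens.
    tokens = re.findall(r"[a-z]+", value.lower())
    return [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]


print(index_terms("Black-cats", "analyzed"))      # tweet-like field
print(index_terms("Black-cats", "not_analyzed"))  # tag-like field
```

A real stemmer handles far more than plural "s"; the point is only that an analyzed field is stored as several normalized terms while a not_analyzed field keeps the value as a single term.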
Complex Core Field Types
- ES also supports null, arrays, and objects
Multivalue Fields
- No special mapping required for arrays
- All values must be of the same datatype
- A retrieved array is in the same order as when the document was indexed
Empty fields
- Not indexed
- Lucene does not support storing null values
- All of these are treated as empty fields: null, [], [ null ]
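As a small illustration of the three cases listed above, the check below treats a value as empty in the same way. `is_empty_field` is a hypothetical helper, and extending `[ null ]` to "a list containing only nulls" is an assumption of this sketch.

```python
def is_empty_field(value):
    """Return True for the values treated as empty: null, [], [ null ]."""
    if value is None:
        return True
    if isinstance(value, list):
        # Assumption: a list holding nothing but nulls counts as empty too.
        return all(item is None for item in value)
    return False


for candidate in (None, [], [None], [1, None], "text"):
    print(repr(candidate), is_empty_field(candidate))
```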
Multilevel Objects
- ES detects new object fields dynamically and maps them as type object
- Each inner field is listed under properties
user, tweet (the type itself, i.e. the root object) and name are all objects
{
  "gb": {
    "tweet": {
      "properties": {
        "tweet": { "type": "string" },
        "user": {
          "type": "object",
          "properties": {
            "id": { "type": "string" },
            "gender": { "type": "string" },
            "age": { "type": "long" },
            "name": {
              "type": "object",
              "properties": {
                "full": { "type": "string" },
                "first": { "type": "string" },
                "last": { "type": "string" }
              }
            }
          }
        }
      }
    }
  }
}
How objects are indexed
- Lucene doesn't understand inner objects
- A Lucene document consists of a flat list of key-value pairs
Flattened document
{
  "tweet": ["elasticsearch", "flexible", "very"],
  "user.id": ["@johnsmith"],
  "user.gender": ["male"],
  "user.age": [26],
  "user.name.full": ["john", "smith"],
  "user.name.first": ["john"],
  "user.name.last": ["smith"]
}
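The flattening can be sketched as a short recursive function. This is an illustration only: the input document is a hypothetical one chosen to reproduce the flattened form above, and the toy `analyze` (lowercase, split on whitespace, sorted unique terms) stands in for whatever analyzer each field really uses.

```python
def analyze(value):
    """Toy analysis: strings become sorted unique lowercase terms."""
    if isinstance(value, str):
        return sorted(set(value.lower().split()))
    return [value]  # non-string values are kept as a single exact value


def flatten(obj, prefix=""):
    """Turn nested objects into a flat dict keyed by dotted field paths."""
    flat = {}
    for key, value in obj.items():
        path = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, path + "."))   # recurse into inner object
        else:
            flat[path] = analyze(value)
    return flat


# Hypothetical source document.
doc = {
    "tweet": "Elasticsearch very flexible",
    "user": {
        "id": "@johnsmith",
        "gender": "male",
        "age": 26,
        "name": {"full": "John Smith", "first": "John", "last": "Smith"},
    },
}
flat = flatten(doc)
for path, terms in flat.items():
    print(path, terms)
```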
Arrays of inner objects
Indexed document
{
  "followers": [
    { "age": 35, "name": "Mary White" },
    { "age": 26, "name": "Alex Jones" },
    { "age": 19, "name": "Lisa Smith" }
  ]
}
Flattened document
{
  "followers.age": [19, 26, 35],
  "followers.name": ["alex", "jones", "lisa", "smith", "mary", "white"]
}
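A sketch of how an array of inner objects collapses into multivalue fields, assuming the same toy analysis as elsewhere in these notes (lowercase, split on whitespace). `flatten_array_field` is a hypothetical helper, and the terms are simply sorted, so their order may differ from the listing above.

```python
def flatten_array_field(field, objects):
    """Merge the values of every inner object into one bag of terms per path."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            terms = value.lower().split() if isinstance(value, str) else [value]
            flat.setdefault(f"{field}.{key}", set()).update(terms)
    return {path: sorted(values) for path, values in flat.items()}


followers = [
    {"age": 35, "name": "Mary White"},
    {"age": 26, "name": "Alex Jones"},
    {"age": 19, "name": "Lisa Smith"},
]
flat = flatten_array_field("followers", followers)
print(flat["followers.age"])
print(flat["followers.name"])
```

Note that the flattened form no longer records which age belongs to which name: both `26` and `mary` are present, although no single follower has that combination.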